Aligning language models to follow instructions - nikkie-memos

https://openai.com/research/instruction-following

https://openai.com/index/instruction-following/

InstructGPT is better than GPT-3 at following English instructions.

RLHF

Alignment tax

Degraded generalization performance, i.e., forgetting of knowledge acquired in pre-training

Mitigated with replay

Data from pre-training is used to maintain generalization performance

γ: how much weight replay is given (a hyperparameter)
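
A rough sketch of how γ could enter the training objective. The memo does not give the formula. The shape below (an RL term with a β-weighted penalty against the supervised policy, plus a γ-weighted log-likelihood on pre-training samples) follows the mixed objective described in the InstructGPT paper, as far as I understand it. The function name, argument names, and toy numbers are all hypothetical.

```python
from statistics import fmean


def mixed_objective(rewards, log_ratios, pretrain_logprobs, beta, gamma):
    """Return a scalar objective to maximize (hypothetical helper).

    rewards:           reward-model scores for sampled responses
    log_ratios:        log(pi_RL(y|x) / pi_SFT(y|x)) for the same responses
    pretrain_logprobs: log pi_RL(x) for samples drawn from pre-training data
    beta:              strength of the penalty for drifting from the SFT policy
    gamma:             weight of the replay (pre-training) term
    """
    # RL term: reward minus a per-sample penalty, averaged over responses.
    rl_term = fmean([r - beta * lr for r, lr in zip(rewards, log_ratios)])
    # Replay term: average log-likelihood on pre-training samples.
    replay_term = fmean(pretrain_logprobs)
    return rl_term + gamma * replay_term


# gamma = 0 ignores replay; a larger gamma rewards keeping
# pre-training text likely under the fine-tuned policy.
no_replay = mixed_objective([1.0, 3.0], [0.0, 0.0], [-2.0, -4.0], beta=0.1, gamma=0.0)
with_replay = mixed_objective([1.0, 3.0], [0.0, 0.0], [-2.0, -4.0], beta=0.1, gamma=0.5)
```

With γ = 0 the replay term drops out and only the RL term remains. Raising γ trades reward for retained pre-training likelihood, which is the knob the memo refers to.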